(57) Abstract

In a motion or depth estimation method, a set (CS) of candidate motion vectors or depth values formed by already obtained motion 
vectors or depth values for neighboring image parts which are spatio-temporally adjacent to a given image part of interest, is generated 
(MV-MEM) for the given image part of interest. Those candidate motion vectors or depth values which correspond to neighboring image
parts containing more reliable texture information than other neighboring image parts, are prioritized (TBPD) to obtain a prioritized set 
(PCS) of candidate motion vectors or depth values. Thereafter, motion or depth data (MV) for the given image part of interest is furnished 
(ME) in dependence upon the prioritized set (PCS) of candidate motion vectors or depth values. Finally, an image signal processing device 
(ISPD) processes an image signal (Iin) to obtain an enhanced image signal in dependence upon the motion or depth data (MV). 
The invention relates to motion or depth estimation.

Block matching has been used with success in video coding applications [3][4]
for displacement estimation and movement compensation. Block matching algorithms are
iterative minimization algorithms which assume that all pixels within a given block move
uniformly, say with vector (i,j). If for that block we minimize the Mean Squared Error (MSE)
with respect to (i,j), we find, after convergence, the most likely motion for that block from
time t to t+1.

$$ MSE(i,j) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ U_t(m,n) - U_{t+1}(m+i, n+j) \right]^2 \tag{1} $$

Here M and N are the dimensions of the block in pixels, and U_t(m,n) is the pixel intensity of the scene at
time t at the location (m,n). The candidate vectors (i,j) are taken from a set of candidates, CS. The minimal
value of the MSE over CS is called the matching penalty (MP).
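
The matching criterion described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the row/column convention (i is added to the row index m, j to the column index n) and the synthetic test frames are assumptions made for the example, and every candidate is assumed to keep the displaced block inside the frame.

```python
def block_mse(frame_t, frame_t1, top, left, M, N, i, j):
    """MSE between the M x N block at (top, left) in the frame at time t
    and the block displaced by (i, j) in the frame at time t+1."""
    total = 0.0
    for m in range(M):
        for n in range(N):
            diff = frame_t[top + m][left + n] - frame_t1[top + m + i][left + n + j]
            total += diff * diff
    return total / (M * N)


def best_candidate(frame_t, frame_t1, top, left, M, N, candidates):
    """Return (vector, MP): the candidate in CS with minimal MSE and that
    minimal MSE, i.e. the matching penalty."""
    scored = [(block_mse(frame_t, frame_t1, top, left, M, N, i, j), (i, j))
              for (i, j) in candidates]
    mp, vector = min(scored)
    return vector, mp


# Synthetic example (hypothetical data): the frame at t+1 is the frame at t
# shifted by one row and two columns, with wrap-around at the borders.
SIZE = 8
frame_t = [[(3 * r * r + 5 * c * c + r * c) % 17 for c in range(SIZE)]
           for r in range(SIZE)]
frame_t1 = [[frame_t[(r - 1) % SIZE][(c - 2) % SIZE] for c in range(SIZE)]
            for r in range(SIZE)]

CS = [(0, 0), (0, 1), (1, 2)]          # small candidate set
vector, mp = best_candidate(frame_t, frame_t1, 2, 2, 3, 3, CS)
print(vector, mp)                      # (1, 2) 0.0
```

Only the candidates in CS are evaluated, so the cost per block grows with the size of CS and not with the size of a search window.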

In a method proposed by G. de Haan in [5], which we refer to in this text as the
standard solution or the original algorithm, the candidate set consists of values taken from a
given arrangement of neighbors. This known arrangement is optimized to enable an efficient 
hardware implementation. 

It is, inter alia, an object of the invention to provide a better motion or depth 
estimation. To this end, a first aspect of the invention provides a motion-estimation method 
and device as defined by claims 1 and 8. A second aspect of the invention provides methods of 
and a device for extracting depth information from motion as defined by claims 6, 7 and 9. A 
third aspect of the invention provides an image display apparatus as defined by claim 10. 
Advantageous embodiments are defined in the dependent claims. 

In a motion or depth estimation method in accordance with a primary aspect of 
the invention, a set of candidate motion vectors or depth values formed by already obtained 



WO 99/40726 



PCT/IB99/00162



motion vectors or depth values for neighboring image parts which are spatio-temporally 
adjacent to a given image part of interest, is generated for the given image part of interest. 
Those candidate motion vectors or depth values which correspond to neighboring image parts 
containing more reliable texture information than other neighboring image parts, are 
prioritized to obtain a prioritized set of candidate motion vectors or depth values. Thereafter,
motion or depth data for the given image part of interest is furnished in dependence upon the 
prioritized set of candidate motion vectors or depth values. This can be done in the prior art
manner by selecting that candidate motion vector or depth value which results in the lowest
match error for the given image part of interest, possibly after extending the set of candidate
motion vectors or depth values with additional candidates obtained by adding small updates
to the existing candidate motion vectors or depth values.
Finally, an image signal processing device processes an image signal to obtain an enhanced 
image signal in dependence upon the motion or depth data. 

These and other aspects of the invention will be apparent from and elucidated
with reference to the embodiments described hereinafter.

The drawing shows an embodiment of an image display apparatus in 
accordance with the present invention. 

In this description, we propose a block matching algorithm which differentiates
between blocks depending on the reliability of their texture information. We present block
matching techniques in the domain of motion estimation, and we mention how these can be
applied to depth reconstruction. Next, we introduce the notion of confidence and explain how
informational entropy and matching quality in concert can lead to reliable block matching.
This idea is an alternative to other reliability measures that express the quality of the
estimated depth values or motion vectors [6][7]. Further, our new Confidence Based Block Matching
(CBBM) algorithm is introduced. Finally, experimental results illustrating the benefit of our
method are given.

Our algorithm (which is implemented in software) is a modification of the 
standard solution [5] in the following respects: 
• We keep the notion of a small CS rather than a full area search in order to arrive
at efficient convergence. However, in our algorithm the shape of the CS will not be constant
but it will be based on the notion of confidence as explained below.

• In an application of the present invention to depth estimation, the values in the
CS will be depth values rather than motion vectors (i,j); however, for a given camera motion a
depth value can be uniquely translated into its associated apparent motion.

The accuracy of the motion or depth estimation depends on the quality of the
underlying texture. The texture in some regions of the image may be poor, and the motion or
depth values obtained in such regions cannot be given a high confidence.
As a consequence, regions equidistant to the viewer could result in different values of
the depth estimates. To avoid this, the propagation of the motion or depth value should be
controlled, e.g. by the assumed quality of the motion or depth estimation. This is especially
relevant for the case of adjacent blocks with very different texture qualities. We propose to
implement this idea by introducing a quantity that we will call confidence. In a preferred
embodiment, for a given block the confidence C(t) at time t is

where Ent is the informational entropy [8]. For the first iteration MP(t) is given a constant 
value for all blocks. The informational entropy Ent is given by: 

$$ Ent = -\sum_{x=0}^{NG-1} \sum_{y=0}^{NG-1} g(x,y) \log g(x,y) \tag{3} $$

where the g(x,y) are the elements of a Gray Level Co-occurrence Matrix, and NG is the 
number of gray levels in the image. The Co-occurrence matrix expresses the probability of the 
co-occurrences of certain pixel pairs [8]. Instead of informational entropy as defined here, 
spatial variance can be used. 
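
As an illustration of the entropy measure, the following Python sketch builds a normalised Gray Level Co-occurrence Matrix for one block and computes its informational entropy. Several details are assumptions made for the example and are not specified in the text: pixel pairs are taken at a horizontal offset of one pixel, the matrix is not symmetrised, the natural logarithm is used, terms with g(x,y) = 0 are skipped, and the block is assumed to contain at least one pixel pair.

```python
import math


def glcm(block, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix g(x, y): relative frequency with which
    gray level x has gray level y at offset (dx, dy). Offset is an assumption."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    rows, cols = len(block), len(block[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[block[r][c]][block[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def entropy(g):
    """Informational entropy of a co-occurrence matrix (zero entries skipped)."""
    return -sum(p * math.log(p) for row in g for p in row if p > 0)


# Hypothetical 4 x 4 blocks with NG = 2 gray levels.
flat = [[0] * 4 for _ in range(4)]                              # no texture
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]   # some texture

ent_flat = entropy(glcm(flat, 2))
ent_checker = entropy(glcm(checker, 2))
```

In this example the flat block has zero entropy, while the checkerboard block has entropy log 2, because only the two pairs (0,1) and (1,0) occur and they occur equally often.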



The confidence could also be influenced by other image features. For example, 
the use of an edge map, and the a posteriori motion or depth error computation were 
considered, but have not led to any improvements. 



Here we explain how the notion of confidence C(t) is used to obtain candidate
sets CS that take into account the local differences in match quality. One iteration of the
CBBM algorithm comprises the following steps. First, make CS contain the values in the
block of interest plus the values in its 8 neighbors. Then, the values in the K blocks with
lowest C(t) are removed from CS, 0 ≤ K ≤ 8. As in the standard solution [5], we next extend
CS with one random value, and from the final 9-K+1 values in the prioritized candidate set PCS
the optimal candidate (according to MSE) is selected. Note that the central block, which is
contained in the initial CS, is not necessarily included in the final set.
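
The construction of the prioritized candidate set for one block can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, the confidence values are simply taken as given inputs, the random value is supplied by the caller because the text does not say how it is generated, and blocks at the image border are assumed to use only the neighbors that exist.

```python
def prioritized_candidates(values, confidence, row, col, K, random_value):
    """Build PCS for the block at (row, col).

    values[r][c]     : current motion vector or depth value of block (r, c)
    confidence[r][c] : confidence C(t) of block (r, c)
    K                : number of lowest-confidence blocks to drop
    random_value     : extra candidate appended after pruning (caller-supplied)
    """
    rows, cols = len(values), len(values[0])
    # Initial CS: the block of interest plus its (up to) 8 neighbors.
    neighborhood = [(r, c)
                    for r in range(row - 1, row + 2)
                    for c in range(col - 1, col + 2)
                    if 0 <= r < rows and 0 <= c < cols]
    # Rank by decreasing confidence and drop the K least confident blocks.
    ranked = sorted(neighborhood,
                    key=lambda rc: confidence[rc[0]][rc[1]],
                    reverse=True)
    kept = ranked[:max(len(ranked) - K, 0)]
    pcs = [values[r][c] for r, c in kept]
    pcs.append(random_value)
    return pcs


def select_optimal(pcs, cost):
    """Pick the candidate with the lowest cost (e.g. the MSE of the block)."""
    return min(pcs, key=cost)


# Hypothetical 3 x 3 field of motion vectors and confidences.
values = [[(0, 0), (1, 0), (2, 0)],
          [(0, 1), (1, 1), (2, 1)],
          [(0, 2), (1, 2), (2, 2)]]
confidence = [[9, 1, 8],
              [2, 3, 7],
              [4, 5, 6]]

pcs = prioritized_candidates(values, confidence, 1, 1, K=5, random_value=(5, 5))
# Toy cost standing in for the MSE: distance to a "true" vector (2, 1).
best = select_optimal(pcs, cost=lambda v: abs(v[0] - 2) + abs(v[1] - 1))
```

With K=5 the set holds 9-5+1 = 5 values, and the vector of the central block, whose confidence is among the five lowest, is not part of it.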

We observe that for K less than 5 our prioritized candidate set PCS is larger 
than in the original algorithm [5], but it is much better adjusted to the local match quality. 

Experiments have been done using a doll-house sequence. Interlaced frames are
taken into account. Iterations of the CBBM algorithm lead to the convergence of the matching
error, which is the sum of MP for all blocks, and to the convergence of the motion or depth
estimates.

Modifying K influences the convergence of the algorithm. In general, for all
values of K (0..8) the number of required iterations is smaller than in the original algorithm
[5]. We need 2 to 3 iterations while the standard solution needed 5 to 8. In order to get this fast
convergence, the visiting/updating order of the blocks is important. A strategy where the blocks
are updated in the order of decreasing confidence turns out to work best. Further, if K=0
(ignoring confidence), a single iteration step takes longer. The result is stable, and convergence
is monotone. If K increases, a single iteration step is faster, but the matching error and the
motion or depth assignments may display oscillatory convergence. The best results are
obtained for a candidate set consisting of 4 blocks (3 neighbors + a random modified block).
We observe that the CBBM algorithm gives less noisy motion or depth values.
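
The visiting order and the monitoring of the matching error can be sketched as below. This is an assumption-laden outline, not the experimental code: `update_block` is a hypothetical callback that re-estimates one block and returns its matching penalty MP, and the stopping rule (change in total matching error below a tolerance, or a maximum number of iterations) is chosen for the example only.

```python
def visiting_order(confidence):
    """Block coordinates sorted by decreasing confidence."""
    coords = [(r, c)
              for r, row in enumerate(confidence)
              for c, _ in enumerate(row)]
    return sorted(coords, key=lambda rc: confidence[rc[0]][rc[1]], reverse=True)


def run_iterations(confidence, update_block, max_iterations=8, tolerance=1e-6):
    """Iterate over all blocks and record the matching error per iteration.

    The matching error is the sum of MP over all blocks. The order is
    recomputed every iteration, so a caller that updates `confidence`
    inside `update_block` gets an adapted order.
    """
    history = []
    previous = None
    for _ in range(max_iterations):
        total = sum(update_block(r, c) for r, c in visiting_order(confidence))
        history.append(total)
        if previous is not None and abs(previous - total) <= tolerance:
            break
        previous = total
    return history


# Toy example: each visit halves a block's penalty until it reaches 1.0.
confidence = [[0.2, 0.9],
              [0.5, 0.1]]
penalties = {(r, c): 8.0 for r in range(2) for c in range(2)}


def update_block(r, c):
    penalties[(r, c)] = max(1.0, penalties[(r, c)] / 2)
    return penalties[(r, c)]


order = visiting_order(confidence)
history = run_iterations(confidence, update_block)
```

Here `order` starts with the most confident block, (0, 1), and `history` shows a monotonically decreasing matching error that settles once no block changes any more.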

A preferred application of this invention addresses the problem of the extraction
of depth from motion. From a video image sequence depth information is extracted using an
iterative block matching algorithm. The accuracy and the convergence speed of this algorithm
are increased by introducing Confidence Based Block Matching (CBBM) as explained above.
This new method is based on prioritizing blocks containing more reliable texture information.

The following problem is considered: given a video sequence of a static scene
taken by a camera with known motion, depth information should be recovered. All apparent
motion in the video sequence results from parallax. Differences in motion between one region
and another indicate a depth difference. Indeed, analyzing two consecutive frames, the parallax
between a given picture region at time t and the same region at t+1 can be computed. This
parallax corresponds to the motion of different parts of the scene. Objects in the foreground
move more than those in the background. By applying geometrical relations, the depth
information can be deduced from the motion.
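
The text does not spell out the geometrical relations. As one common special case, assumed here purely for illustration, consider a pinhole camera that translates parallel to the image plane by a known distance between two frames: the apparent displacement of a static point is then inversely proportional to its depth. The parameter names `focal_px` (focal length in pixels) and `baseline` (camera translation between frames) are hypothetical.

```python
def depth_to_motion(depth, focal_px, baseline):
    """Apparent displacement (pixels) of a static point at the given depth,
    for a camera translating sideways by `baseline` (assumed geometry)."""
    return focal_px * baseline / depth


def motion_to_depth(displacement, focal_px, baseline):
    """Inverse relation: depth from the measured displacement in pixels."""
    if displacement == 0:
        return float("inf")   # no parallax: point is infinitely far away
    return focal_px * baseline / abs(displacement)


# Hypothetical camera: 512 px focal length, 0.125 units of translation.
near = depth_to_motion(2.0, 512.0, 0.125)    # foreground point
far = depth_to_motion(8.0, 512.0, 0.125)     # background point
recovered = motion_to_depth(near, 512.0, 0.125)
```

Under this assumed geometry the foreground point moves 32 pixels and the background point 8 pixels, and each depth value maps to exactly one displacement, so a candidate set of depth values can be evaluated with the same block matching criterion as a candidate set of motion vectors.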

All surfaces in the scene are assumed to be non-specular reflective, and the
illumination conditions are assumed to be approximately constant. The texture is supposed to
be sufficiently rich.

This problem has received ample attention in the literature: in particular, both
feature based [1] and block based [2] techniques have been proposed. The advantage of the
feature based methods is that they can cope with large frame-to-frame camera displacements;
however, they require the solution of the feature matching problem, which is computationally
complex. To avoid feature matching we will focus on block-based techniques.

A new block matching algorithm for depth-from-motion estimation
has been proposed. The CBBM method is based on prioritizing blocks containing more reliable
texture information. It is proposed to attribute to each block a confidence measure depending
on the matching quality and the informational entropy. The accuracy of the motion or depth
estimation and the convergence speed of the algorithm are better than those of the standard solution.

The drawing shows an embodiment of an image display apparatus in
accordance with the present invention. An image signal Iin is applied to an image signal
processing device ISPD to obtain an enhanced image signal having an increased line and/or
field rate. The enhanced image signal is displayed on a display device DD. The processing
effected by the image signal processing device ISPD depends on motion vectors MV
generated by a motion estimator ME. In one embodiment, the image signal processing device
ISPD generates depth information from the motion vectors MV, and processes the image
signal Iin in dependence on the depth information.

The image signal Iin is also applied to the motion estimator ME and to a texture
extraction device TE. An output of the texture extraction device TE is coupled to a texture
memory TM. Motion vectors MV estimated by the motion estimator ME are applied to a
motion vector memory MV-MEM to supply a set CS of candidate motion vectors. A texture
dependent prioritizing device TBPD prioritizes the set CS of candidate motion vectors to
obtain a prioritized set PCS of candidate motion vectors. The motion estimator ME estimates
the motion vectors MV on the basis of the image signal Iin and the prioritized set PCS of
candidate motion vectors.
A further aspect of the invention provides a motion estimating method
comprising the steps of:

• generating a set PCS of N-K candidate motion vectors containing motion vectors for a block
of interest and N-1 blocks adjacent to said block of interest excluding motion vectors for K
blocks having a lower confidence quantity C(t) than the N-K other blocks; and

• furnishing motion information in dependence on said set PCS of N-K motion vectors.

Preferably, said motion information furnishing step includes the steps of:
• adding a random value to said set PCS of N-K motion vectors; and
• selecting an optimal candidate from said set of N-K+1 motion vectors.

Preferably, blocks are updated in the order of decreasing confidence quantity,
whereby blocks containing more reliable texture information are prioritized.

Preferably, said confidence quantity C(t) depends on matching quality and/or
informational entropy.

Yet another aspect of the invention provides a method of extracting depth
information from motion, the method comprising the steps of:

• estimating motion as defined in the previous aspect of the invention; and
• generating depth information from the motion information.

It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. The notions
"neighboring" and "adjacent" are not limited to "directly neighboring" and "directly adjacent",
respectively; it is possible that there are image parts positioned between a given image part of
interest and a neighboring image part. The notion "already obtained motion vectors for
neighboring image parts which are spatio-temporally adjacent to an image part of interest"
includes motion vectors obtained during a previous field or frame period. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word "comprising" does not exclude the presence of other elements or steps than those listed
in a claim. The invention can be implemented by means of hardware comprising several
distinct elements, and by means of a suitably programmed computer. In the device claim
enumerating several means, several of these means can be embodied by one and the same item
of hardware.
CLAIMS:



1. A motion estimation method, comprising:

generating (MV-MEM) for a given image part of interest, a set (CS) of
candidate motion vectors formed by already obtained motion vectors for neighboring image
parts which are spatio-temporally adjacent to said given image part of interest;

prioritizing (TBPD) those candidate motion vectors which correspond to
neighboring image parts containing more reliable texture information than other neighboring
image parts, to obtain a prioritized set (PCS) of candidate motion vectors; and

furnishing (ME) motion data (MV) for said given image part of interest in
dependence upon said prioritized set (PCS) of candidate motion vectors.

2. A motion estimating method as claimed in claim 1, wherein

said prioritizing step (TBPD) comprises generating a set (PCS) of N-K
candidate motion vectors containing motion vectors for a block of interest and N-1 blocks
spatio-temporally adjacent to said block of interest excluding motion vectors for K blocks
having a lower confidence quantity than the N-K other blocks; and

said motion data furnishing step (ME) comprises furnishing motion information
in dependence on said set (PCS) of N-K motion vectors.



3. A motion estimating method as claimed in claim 2, wherein said motion
information furnishing step includes:

adding a random value to said set (PCS) of N-K motion vectors; and

selecting an optimal candidate from said set of N-K+1 motion vectors.



4. A motion estimating method as claimed in claim 2, wherein blocks are updated
in the order of decreasing confidence quantity.



5. A motion estimating method as claimed in claim 2, wherein said confidence
quantity depends on matching quality and/or informational entropy.



6. A method of extracting depth information from motion, the method comprising:

estimating (ME, MV-MEM, TBPD) motion data (MV) as claimed in claim 1; and

generating (ISPD) depth information from the motion data (MV).

7. A depth estimation method, comprising:

generating (MV-MEM) for a given image part of interest, a set (CS) of
candidate depth values formed by already obtained depth values for neighboring image parts
which are spatio-temporally adjacent to said given image part of interest;

prioritizing (TBPD) those candidate depth values which correspond to
neighboring image parts containing more reliable texture information than other neighboring
image parts, to obtain a prioritized set (PCS) of candidate depth values; and

furnishing (ME) depth data (MV) for said given image part of interest in
dependence upon said prioritized set (PCS) of candidate depth values.

8. A motion estimation device, comprising:

means (MV-MEM) for generating for a given image part of interest, a set (CS)
of candidate motion vectors formed by already obtained motion vectors for neighboring image
parts which are spatio-temporally adjacent to said given image part of interest;

means (TBPD) for prioritizing those candidate motion vectors which
correspond to neighboring image parts containing more reliable texture information than other
neighboring image parts, to obtain a prioritized set (PCS) of candidate motion vectors; and

means (ME) for furnishing motion data (MV) for said given image part of
interest in dependence upon said prioritized set (PCS) of candidate motion vectors.

9. A depth estimation device, comprising:

means for generating (MV-MEM) for a given image part of interest, a set (CS)
of candidate depth values formed by already obtained depth values for neighboring image
parts which are spatio-temporally adjacent to said given image part of interest;

means for prioritizing (TBPD) those candidate depth values which correspond
to neighboring image parts containing more reliable texture information than other
neighboring image parts, to obtain a prioritized set (PCS) of candidate depth values; and

means for furnishing (ME) depth data (MV) for said given image part of interest
in dependence upon said prioritized set (PCS) of candidate depth values.
10. An image display apparatus, comprising:

a motion or depth estimation device (ME, MV-MEM, TBPD) as claimed in 
claim 8 or 9 to furnish motion or depth data (MV); 

an image signal processing device (ISPD) for processing an image signal (Iin) 
to obtain an enhanced image signal in dependence upon said motion or depth data (MV); and 

a display device (DD) for displaying the enhanced image signal. 